Why Corpus-Based Statistics-Oriented Machine Translation

Author

Jing-Shin Chang

Abstract

Rule-based approaches have been the dominant paradigm in developing MT systems. Such approaches, however, suffer from difficulties in knowledge acquisition when trying to cover the wide variety and time-changing characteristics of real text. To attack this problem, some statistical translation models and supporting tools have been developed in the last few years. However, a simple statistical model often results in a large parameter space and thus requires a large training corpus. It is therefore necessary to introduce language models that take advantage of well-justified linguistic knowledge to make stochastic MT systems practical. A stochastic model that emphasizes the adoption of well-justified linguistic knowledge in developing the model is called a corpus-based statistics-oriented approach. In this paper, the corpus-based statistics-oriented paradigm is proposed, and its characteristics are compared with those of other methodologies. Recent progress in some corpus-based statistics-oriented models for MT is also reviewed.

Similar resources

Corpus-Based Statistics-Oriented (CBSO) Machine Translation Researches in Taiwan

A brief introduction to the MT research projects in Taiwan is given in this paper. Special attention is given to the increasingly popular corpus-based statistics-oriented (CBSO) approaches in MT research. In particular, the parameterized two-way training philosophy in designing the second generation BehaviorTran, which is the first and the largest operational system in this area, is introduc...

Improving the precision of automatically constructed human-oriented translation dictionaries

In this paper we address the problem of automatic acquisition of a human-oriented translation dictionary from a large-scale parallel corpus. The initial translation equivalents can be extracted with the help of the techniques and tools developed for the phrase-table construction in statistical machine translation. The acquired translation equivalents usually provide good lexicon coverage, but t...

Sub-Sentential Alignment Method by Analogy

This paper describes a method for searching word correspondences between pairs of translation sentences. In Example-Based Machine Translation, translation patterns can be extracted easily if word correspondences between pairs of translation sentences are defined. The popular methods for aligning a bilingual corpus at a sub-sentential level are unable to produce reliable results when the size of...

Translation Ambiguity Resolution Based On Text Corpora Of Source And Target Languages

We propose a new method to resolve ambiguity in translation and meaning interpretation using linguistic statistics extracted from dual corpora of source and target languages in addition to the logical restrictions described in dictionary and grammar rules for ambiguity resolution. It provides reasonable criteria for determining a suitable equivalent translation or meaning by making the depende...

How to Avoid Burning Ducks: Combining Linguistic Analysis and Corpus Statistics for German Compound Processing

Compound splitting is an important problem in many NLP applications which must be solved in order to address issues of data sparsity. Previous work has shown that linguistic approaches for German compound splitting produce a correct splitting more often, but corpus-driven approaches work best for phrase-based statistical machine translation from German to English, a worrisome contradiction. We ...

Publication date: 1992
